Abstract

Dynamics on networks is considered from the perspective of Markov stochastic processes. We partially describe the state of the system through network motifs and infer any missing data using the available information. This versatile approach is especially well adapted for modelling spreading processes and/or population dynamics. In particular, the generality of our systematic framework and the fact that its assumptions are explicitly stated suggest that it could be used as a common ground for comparing existing epidemic models too complex for direct comparison, such as agent-based computer simulations. We provide many examples for the special cases of susceptible-infectious-susceptible (SIS) and susceptible-infectious-removed (SIR) dynamics (e.g., epidemic propagation) and we observe multiple situations where accurate results may be obtained at low computational cost. Our perspective reveals a subtle balance between the complex requirements of a realistic model and its basic assumptions.

Keywords: contact networks, epidemics, stochastic processes, complex networks, spreading dynamics, Markov processes

1 Introduction 

Mathematical modelling has proven to be a valuable tool when addressing public health issues. The increasing availability of powerful computer resources has facilitated the use of agent-based models and other complex modelling approaches, all accounting for numerous parameters and assumptions [1, 2, 3]. Our confidence in these models may increase when they are shown to agree with empirical observations and/or with previously accepted models. However, when discrepancies appear, the complexity of these computer programs may obscure the effect of underlying assumptions, making it difficult to isolate the source of disagreement. While analytical approaches offer more insight into the underlying assumptions, their use is often restricted to simpler interaction structures and/or dynamics.

The purpose of this paper is to systematically model the global behaviour of stochastic systems composed of numerous elements interacting in a complex way. "Complex" here implies that interactions among the elements follow some nontrivial patterns that are neither perfectly regular nor completely random, as often seen in real-world systems. "Stochastic" implies that the system may not be completely predictable to us and that a probabilistic solution is sought.

To this end, we present (Sec. 2) a general modelling scheme where network theory [4, 5] accounts for the interactions between the elements of the system and where a birth-death Markov process [6] models the stochastic dynamics. Since a tremendous amount of information may be required to store the state of the whole system, we seek the part of this information that is important for the problem at hand and then approximate the dynamics by tracking only this limited subset. Part of the discarded data may still affect, albeit weakly, the behaviour of the system. We fill this knowledge gap by inferring the missing information such that it is consistent both with the information we follow and with any other prior information that is available to us.

An important part of this paper (Sec. 3) provides explicit examples of these general ideas. For simplicity, each case corresponds either to a susceptible-infectious-susceptible (SIS) or to a susceptible-infectious-removed (SIR) spreading process, both standard in the study of infectious disease propagation. While our first examples study simpler cases, facilitating the understanding of our systematic method, the later models show how the same approach applies to more complex interaction structures.

We then compare and analyse the results of these examples (Sec. 4). This reveals some general considerations for both the accuracy and the complexity of our modelling approach. We find that treating the inference of missing information explicitly helps systematize the model development and highlights numerous possibilities for future developments. An important simplification occurs for SIR spreading processes and related dynamics, leading to an exact model with a small number of dynamical variables.

We conclude (Sec. 5) by discussing how our general approach may be applied beyond spreading processes, for example, to population dynamics. Returning to the problem of understanding the source of discrepancies in complex models, we explain how modelling these models with our method could help identify important assumptions and isolate the source of disagreement. Mathematical details and further generalizations are also presented in an Electronic Supplementary Material (ESM) [7].



2 General modelling scheme 

We assume that the real-world system to be modelled is sufficiently well understood to implement a Monte Carlo computer simulation that approximately reproduces its behaviour. We refer to this hypothetical computer simulation as the full system: Z denotes the state of the full system (i.e., all the data that would be stored by the computer program) while V denotes the rules governing the evolution of Z in time (i.e., the program itself).

However, there are many situations where a direct implementation of the full system is impractical due to storage and/or computation considerations. We thus design a simplified system that aims at reproducing the behaviour of the full one while requiring fewer resources.

The state X of this simplified system (much smaller than Z) evolves in time according to the rules W. Moreover, we denote by Y any known prior information that is relevant to a Bayesian inference of Z.

This last point is crucial: Y often makes the difference between an accurate model and a useless one. It bridges the gap 
between the simplified representation of the state of the system (i.e., X) and the full one (i.e., Z). 

Since we are interested in systems composed of many elements interacting through complex patterns, we express the 
previous quantities in terms of networks. 



2.1 Networks 

A network (graph) is a collection of nodes (vertices) and links (edges). Nodes model the elements of a system; links join 
nodes pairwise to represent interactions between the corresponding elements. Two nodes sharing a link are said to be 
neighbours and the degree of a node is its number of neighbours. The part of a link that is attached to a node is called a 
stub: there are two stubs per link and each node is attached to a number of stubs equal to its degree. A link with both ends 
leading to the same node is called a self-loop, and repeated links occur when more than one link joins the same two nodes.

For some systems, specifying the state Z exactly amounts to specifying the network structure. However, most systems are not purely structural: they are composed of elements that, by themselves, require additional information to be properly characterized. Hence, we assign to each node a node state that specifies the intrinsic properties of the corresponding element in the system. Both the structure and these intrinsic node states are specified by Z; see ESM [7] §I for further examples of information that may be contained in Z, including the important case of directed networks.



2.2 Motifs 

Specifying the complete structure of a complex network requires a tremendous amount of information. Since we want the state X of a simplified system to be of manageable size, approximations have to be made. A convenient way to do so, and one that has proven to give good results in the past [9, 10, 11, 12, 13, 14], is to specify the network structure through its building blocks.

A network motif is a pattern that may appear a number of times in the network. For example, two linked nodes form a pair motif while three nodes that are all neighbours of one another form a triangle motif. Motifs may encode intrinsic node states or other relevant information; further details and examples are provided throughout Section 3 as well as in ESM [7] §II.

We define the state vector x of a system as a vector of integers specifying how many times different motifs appear in 
the network. These motifs may be attached together to form a network structure: the state vector X = x enumerates the available building blocks while the prior information Y specifies how such blocks may be attached. There will usually be numerous valid ways to attach the blocks, some more probable than others. Given the available information, the resulting distribution is our best estimate for P(Z | x, Y).
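As a concrete illustration of a state vector built from node and pair motifs, the following minimal Python sketch counts the SIS motifs S, I, S-S, S-I and I-I (introduced in Section 3) in a small network. The edge-list and dictionary representation and the function names are illustrative assumptions, not part of the original framework.

```python
from collections import Counter


def pair_label(a, b):
    """Label of the pair motif formed by two linked nodes of states a and b."""
    return f"{a}-{b}" if a == b else "S-I"


def state_vector(edges, states):
    """Count node motifs (S, I) and pair motifs (S-S, S-I, I-I) for SIS.

    edges: list of (u, v) links, one entry per link
    states: dict mapping each node to "S" or "I"
    Returns the tuple (x_S, x_I, x_SS, x_SI, x_II).
    """
    x = Counter(states.values())                                   # node motifs
    x.update(pair_label(states[u], states[v]) for u, v in edges)   # pair motifs
    return (x["S"], x["I"], x["S-S"], x["S-I"], x["I-I"])
```

For instance, a triangle 0-1-2 with a pendant node 3 attached to node 2, with nodes 1 and 2 infectious, yields (2, 2, 0, 3, 1).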

By judiciously choosing the motifs enumerated in x and by specifying informative prior information Y, one may hope for this probability distribution to be sharply concentrated around the "real" value of Z in the full system. This mapping can
then be used to convert the rules V of the full system to the rules W of the new simplified one. We approach this problem 
from the perspective of birth-death Markov processes. 



2.3 Birth-death stochastic processes 



In a birth-death process, the elements composing a system may be destroyed (death) while new ones may be created 
(birth). It is therefore natural to state the rules W of our simplified system in those terms: any change in the state vector 
x may be perceived as an event where motifs are created and/or destroyed. 

Quantitatively, a forward transition event of type j takes the system from state x to state x + r^j and has probability q_j^+(x, Y) dt to occur during the time interval [t, t + dt). Similarly, a backward transition event of type j takes the system from state x to state x − r^j and has probability q_j^−(x, Y) dt to occur during the same time interval.

Specifying, for each j, the elements r_i^j of the shift vector r^j together with the rate functions q_j^+(x, Y) and q_j^−(x, Y) thus completely defines the rules W governing the simplified system. This Markov process is summarized in the master equation



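$$
\frac{dP(\mathbf{x}|Y,t)}{dt} = \sum_j \Big[ q_j^+(\mathbf{x}-\mathbf{r}^j, Y)\, P(\mathbf{x}-\mathbf{r}^j|Y,t) - q_j^+(\mathbf{x}, Y)\, P(\mathbf{x}|Y,t) + q_j^-(\mathbf{x}+\mathbf{r}^j, Y)\, P(\mathbf{x}+\mathbf{r}^j|Y,t) - q_j^-(\mathbf{x}, Y)\, P(\mathbf{x}|Y,t) \Big] \tag{2}
$$

specifying the evolution of the probability P(x | Y, t) to observe state x at time t (notation compatible with [6] §7.5).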
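Trajectories of the Markov process summarized by (2) can also be sampled directly. The sketch below uses Gillespie's stochastic simulation algorithm, a standard technique that this paper does not itself prescribe; the interface (rate functions called as q(j, x) with an event index j and a state tuple x) is a hypothetical choice made for illustration.

```python
import random


def gillespie(x0, shifts, q_plus, q_minus, t_max, rng=None):
    """Sample one trajectory of the birth-death process described by (2).

    x0: initial state vector (tuple of ints)
    shifts: list of shift vectors r^j, one per event type j
    q_plus, q_minus: rate functions called as q(j, x)
    Returns a list of (time, state) pairs, stopping at t_max or in an absorbing state.
    """
    rng = rng or random.Random()
    t, x = 0.0, tuple(x0)
    path = [(t, x)]
    while True:
        # Every event type j offers a forward (+1) and a backward (-1) channel.
        channels = [(j, sign, q(j, x))
                    for j in range(len(shifts))
                    for sign, q in ((+1, q_plus), (-1, q_minus))]
        total = sum(rate for _, _, rate in channels)
        if total <= 0.0:
            break                                   # no event can occur any more
        t += rng.expovariate(total)                 # exponential waiting time
        if t > t_max:
            break
        # Choose a channel with probability proportional to its rate.
        j, sign, _ = rng.choices(channels, weights=[c[2] for c in channels])[0]
        x = tuple(xi + sign * rj for xi, rj in zip(x, shifts[j]))
        path.append((t, x))
    return path
```

Averaging many such trajectories is one way to estimate P(x | Y, t) when the approximations introduced next are not justified.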

We now consider two approximations that are often justified for large systems: the elements of x may be treated as varying continuously, and the probability distribution is strongly concentrated around its mean value. In such cases, the evolution of the mean value μ(t) = Σ_x x P(x | Y, t) of the vector x at time t is approximately given by

$$
\frac{d\boldsymbol{\mu}(t)}{dt} = \mathbf{a}\big(\boldsymbol{\mu}(t)\big) \tag{3a}
$$

$$
\mathbf{a}(\mathbf{x}) = \sum_j \mathbf{r}^j \left[ q_j^+(\mathbf{x}, Y) - q_j^-(\mathbf{x}, Y) \right] \tag{3b}
$$

where we defined the drift vector a(x) of elements a_i(x) (see [6] §7.5.3 and §4.4.9).
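Equation (3) is an ordinary differential equation for μ(t), so any standard solver applies. A minimal sketch with a fixed-step fourth-order Runge-Kutta scheme follows; the rate-function interface and the one-variable toy example (a well-mixed SIS model with hypothetical parameters, not one of the network models of Section 3) are assumptions for illustration.

```python
def drift(x, shifts, q_plus, q_minus):
    """Drift vector a(x) = sum_j r^j [q_j^+(x) - q_j^-(x)] of eq. (3b)."""
    a = [0.0] * len(x)
    for j, r in enumerate(shifts):
        net = q_plus(j, x) - q_minus(j, x)
        for i, ri in enumerate(r):
            a[i] += ri * net
    return a


def integrate_mean(mu0, shifts, q_plus, q_minus, t_max, dt=1e-3):
    """Integrate d mu / dt = a(mu), eq. (3a), with fixed-step classical RK4."""
    mu = [float(v) for v in mu0]

    def f(m):
        return drift(m, shifts, q_plus, q_minus)

    for _ in range(int(round(t_max / dt))):
        k1 = f(mu)
        k2 = f([m + 0.5 * dt * k for m, k in zip(mu, k1)])
        k3 = f([m + 0.5 * dt * k for m, k in zip(mu, k2)])
        k4 = f([m + dt * k for m, k in zip(mu, k3)])
        mu = [m + dt / 6.0 * (a + 2 * b + 2 * c + d)
              for m, a, b, c, d in zip(mu, k1, k2, k3, k4)]
    return mu


# Hypothetical toy usage: well-mixed SIS with N = 1000, beta = 2, alpha = 1,
# whose deterministic fixed point is N (1 - alpha / beta) = 500.
N, beta, alpha = 1000, 2.0, 1.0
mu = integrate_mean([10], [(1,)],
                    lambda j, x: beta * x[0] * (N - x[0]) / N,
                    lambda j, x: alpha * x[0],
                    t_max=30.0, dt=0.01)
```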

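In order to further refine our knowledge of P(x | Y, t) in the vicinity of this deterministic solution, we define the evolution matrix A(t, t'), the diffusion matrix B(x) (of elements B_{ii'}(x)) and the covariance matrix C(t)

$$
\mathbf{A}(t,t') = \exp\left[ \int_{t'}^{t} \mathbf{J}_{\mathbf{a}}\big(\boldsymbol{\mu}(t'')\big)\, dt'' \right] \tag{4a}
$$

$$
B_{ii'}(\mathbf{x}) = \sum_j r_i^j\, r_{i'}^j \left[ q_j^+(\mathbf{x}, Y) + q_j^-(\mathbf{x}, Y) \right] \tag{4b}
$$

$$
\mathbf{C}(t) = \int_0^t \mathbf{A}(t,t') \cdot \mathbf{B}\big(\boldsymbol{\mu}(t')\big) \cdot \mathbf{A}(t,t')^{\mathsf{T}}\, dt' \tag{4c}
$$

where J_a(x) is the Jacobian matrix of a evaluated at x. Denoting by d the size of the vector x, the probability distribution may be approximated by a d-dimensional Gaussian

$$
P(\mathbf{x}|Y,t) \approx \frac{\exp\left\{ -\tfrac{1}{2} \left[\mathbf{x} - \boldsymbol{\mu}(t)\right]^{\mathsf{T}} \cdot \mathbf{C}(t)^{-1} \cdot \left[\mathbf{x} - \boldsymbol{\mu}(t)\right] \right\}}{\sqrt{(2\pi)^d\, |\mathbf{C}(t)|}} \tag{5}
$$

where |C(t)| is the determinant of C(t). Note that (2)-(5) are all textbook relationships.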

Although many other tools are available for the analysis of stochastic systems, the simplicity, the generality and the 
straightforwardness of the Gaussian approximation make it an instrument of choice that will be used extensively in this 
article. 
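In practice, rather than evaluating (4a) and the integral (4c) explicitly, one may integrate the differential form dC/dt = J_a(μ) C + C J_a(μ)^T + B(μ), with C(0) = 0, alongside (3a); this is the usual linear-noise route, obtained by differentiating (4c) when A(t, t') is read as the propagator of the linearized dynamics. The sketch below uses explicit Euler steps and a finite-difference Jacobian (both simplifying assumptions) and evaluates (5) only for d = 1. A pure-birth process with constant rate λ is a convenient check, since its mean and variance both equal λt.

```python
import math


def drift(x, shifts, q_plus, q_minus):
    """a(x) = sum_j r^j [q_j^+(x) - q_j^-(x)], eq. (3b)."""
    a = [0.0] * len(x)
    for j, r in enumerate(shifts):
        net = q_plus(j, x) - q_minus(j, x)
        for i, ri in enumerate(r):
            a[i] += ri * net
    return a


def diffusion(x, shifts, q_plus, q_minus):
    """B_{ii'}(x) = sum_j r_i^j r_{i'}^j [q_j^+(x) + q_j^-(x)], eq. (4b)."""
    d = len(x)
    B = [[0.0] * d for _ in range(d)]
    for j, r in enumerate(shifts):
        w = q_plus(j, x) + q_minus(j, x)
        for i in range(d):
            for k in range(d):
                B[i][k] += r[i] * r[k] * w
    return B


def jacobian(f, x, h=1e-6):
    """Central finite-difference Jacobian, J[i][k] = d f_i / d x_k."""
    d = len(x)
    J = [[0.0] * d for _ in range(d)]
    for k in range(d):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        fp, fm = f(xp), f(xm)
        for i in range(d):
            J[i][k] = (fp[i] - fm[i]) / (2 * h)
    return J


def linear_noise(mu0, shifts, q_plus, q_minus, t_max, dt=1e-3):
    """Euler steps for d mu/dt = a(mu) and dC/dt = J C + C J^T + B(mu), with C(0) = 0."""
    d = len(mu0)
    mu = [float(v) for v in mu0]
    C = [[0.0] * d for _ in range(d)]

    def a(x):
        return drift(x, shifts, q_plus, q_minus)

    for _ in range(int(round(t_max / dt))):
        J = jacobian(a, mu)
        B = diffusion(mu, shifts, q_plus, q_minus)
        JC = [[sum(J[i][m] * C[m][k] for m in range(d)) for k in range(d)]
              for i in range(d)]
        # C stays symmetric, so (C J^T)[i][k] equals (J C)[k][i].
        C = [[C[i][k] + dt * (JC[i][k] + JC[k][i] + B[i][k]) for k in range(d)]
             for i in range(d)]
        mu = [m + dt * g for m, g in zip(mu, a(mu))]
    return mu, C


def gaussian_1d(x, mean, var):
    """Eq. (5) in the special case d = 1."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
```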



3 Application to spreading dynamics 

Without prejudice to the generality of Section 2, we now focus our study on spreading processes [15, 16, 17]. An epidemiological terminology is used: whatever propagates among neighbouring nodes, be it desirable or not, is called an infection. We find that the basic SIS and SIR epidemiological models, both to be defined shortly, require little prior knowledge on the part of the reader while being sufficiently complex for the needs of the present study.

At a given time, the intrinsic state of each node of an SIS model may either be Susceptible (not carrying the infection) or Infectious (carrying the infection). The full system state Z hence specifies each node's intrinsic state together with the complete structure of the network. The rules V are simple: during any time interval [t, t + dt), each infectious node may recover (i.e., it becomes susceptible) with probability α dt and, for each of its susceptible neighbours, has probability β dt to transmit the infection (i.e., the neighbour becomes infectious).

In addition to the susceptible and infectious intrinsic states, the nodes of an SIR model may also be Removed (they have had the infection and can neither acquire nor transmit it ever again). The rules V are the same as for the SIS model with respect to infection (i.e., infectious nodes transmit to each of their susceptible neighbours with probability β dt), but recovery is replaced by removal (i.e., infectious nodes become removed with probability α dt).

The remainder of this section studies how different choices of state vector x and prior information Y translate into the rules W of the simplified system. In each case, W is defined through a set of equations whose tags all share the same numeral, e.g., (6a)-(6f). Although figures present results concomitantly with the specification of the corresponding models, all discussions are deferred to Section 4.



3.1 Pair-based SIS model 



Section 2.2 defined a pair motif as two linked nodes. Since the nodes of an SIS model are either susceptible or infectious, there are three possibilities for pair motifs: two linked susceptible nodes (noted S-S), two linked infectious nodes (noted I-I) and a susceptible node linked to an infectious one (noted S-I). The two nodes involved in a pair motif may have other neighbours.

Pair motifs are often used in conjunction with node motifs: the trivial structure consisting of a single node. In the SIS model, there are two possibilities for a node motif: susceptible nodes (noted S) and infectious nodes (noted I). A state vector x based on both node and pair motifs would thus be composed of five elements enumerating the number of times each motif appears in the network: x_S, x_I, x_{S-S}, x_{S-I} and x_{I-I}. However, additional assumptions about the structure of the network may cause some of these quantities to be redundant.



3.1.1 Degree-regular network 

We first consider the simple case where the network is known to be an n-regular network of size N: there are N nodes in the network, all of which have n neighbours (degree n). Such a network must respect the structural constraints x_S = N − x_I, x_{S-S} = (n x_S − x_{S-I})/2 and x_{I-I} = (n x_I − x_{S-I})/2. Hence, with the prior information Y specifying N and n, the state vector

$$
\mathbf{x} = (x_I,\ x_{S\text{-}I}) \tag{6a}
$$

suffices to obtain all five node and pair motifs.
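The structural constraints above are stub-counting identities: the n x_S stubs of susceptible nodes end either in S-S pairs (two S stubs each) or in S-I pairs (one S stub each), so n x_S = 2 x_{S-S} + x_{S-I}, and similarly n x_I = 2 x_{I-I} + x_{S-I}. A short sketch of the reconstruction follows; the function name and the parity check are illustrative additions, not part of the original model.

```python
def regular_motifs(x_I, x_SI, N, n):
    """Recover (x_S, x_I, x_SS, x_SI, x_II) from x = (x_I, x_SI) on an n-regular network of N nodes."""
    x_S = N - x_I
    # n x_S = 2 x_SS + x_SI  and  n x_I = 2 x_II + x_SI  (stub counting)
    x_SS, odd_S = divmod(n * x_S - x_SI, 2)
    x_II, odd_I = divmod(n * x_I - x_SI, 2)
    if odd_S or odd_I or x_SS < 0 or x_II < 0:
        raise ValueError("(x_I, x_SI) is not compatible with an n-regular network")
    return (x_S, x_I, x_SS, x_SI, x_II)
```

On the complete graph of four nodes (n = 3) with two infectious nodes, the four S-I links give back one S-S and one I-I link, as a direct count confirms.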

In those terms, the rules V specify that an infection has probability β x_{S-I} dt to occur during the time interval [t, t + dt) while a recovery has probability α x_I dt to occur. Clearly, an infection translates to the destruction of an S motif and the creation of a new I one, and a recovery corresponds to the inverse process. However, pair motifs are also affected by such transitions since the affected node has neighbours. Hence, the effect on x of the infection or recovery of a node depends on information that is not directly tracked by x (i.e., the states of the infected or recovered node's neighbours), and we thus have to infer this information from the available data.
In order to facilitate this inference, we define the first neighbourhood motif SΓ_1(k_S, k_I) as a susceptible node that has k_S susceptible neighbours and k_I infectious neighbours. Similarly, the motif IΓ_1(k_S, k_I) corresponds to an infectious node with k_S susceptible neighbours and k_I infectious ones. In both cases, we qualify as central the node whose neighbours are explicitly stated. The other nodes of the first neighbourhood motif, i.e., the neighbours of the central node, may have other neighbours of their own.

We can now define a forward transition event of type j ∈ {0, 1, ..., n} as the infection of the central node of an SΓ_1(n − j, j) motif. In terms of node and pair motifs, this implies the destruction of one S motif, of n − j S-S motifs and of j S-I motifs, together with the creation of one new I motif, of n − j new S-I motifs and of j new I-I motifs. Since only x_I and x_{S-I} are tracked, the shift vectors are

$$
\mathbf{r}^j = (1,\ n - 2j) . \tag{6b}
$$

This same vector also defines the backward transition events j ∈ {0, 1, ..., n}, which correspond to the recovery of the central node of an IΓ_1(n − j, j) motif.

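Looking back at the rules V, the corresponding forward and backward transition rate functions are

$$
q_j^+(\mathbf{x}, Y) = \beta\, x_{S\text{-}I}\; P\big(S\Gamma_1(n-j, j) \,\big|\, S, j \geq 1, \mathbf{x}, Y\big) \tag{6c}
$$

$$
q_j^-(\mathbf{x}, Y) = \alpha\, x_I\; P\big(I\Gamma_1(n-j, j) \,\big|\, I, \mathbf{x}, Y\big) \tag{6d}
$$

where two inference terms have been defined.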



The inference term of (6d) gives the probability for a motif to be an IΓ_1(n − j, j) knowing that it has an infectious node at its center and that the current state vector is x with prior information Y. For a sufficiently large network, this approximately corresponds to randomly drawing the n neighbours of the central infectious node among the pair motifs S-I and I-I

$$
P\big(I\Gamma_1(n-j, j) \,\big|\, I, \mathbf{x}, Y\big) = \binom{n}{j} \left(\frac{x_{S\text{-}I}}{n\, x_I}\right)^{n-j} \left(1 - \frac{x_{S\text{-}I}}{n\, x_I}\right)^{j} . \tag{6e}
$$

The inference term of (6c) is very similar, except that the central susceptible node is known to have at least one infectious neighbour since it acquired the infection through an S-I motif

$$
P\big(S\Gamma_1(n-j, j) \,\big|\, S, j \geq 1, \mathbf{x}, Y\big) = \binom{n-1}{j-1} \left(\frac{x_{S\text{-}I}}{n\, x_S}\right)^{j-1} \left(1 - \frac{x_{S\text{-}I}}{n\, x_S}\right)^{n-j} \tag{6f}
$$

which completes the rules W for the pair-based SIS model on an n-regular network of size N.
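The rules (6b)-(6f) translate almost line by line into code. The sketch below, with hypothetical function names, returns the shift vector and both rates for every event type j and accumulates the drift (3b). Because the inference terms (6e) and (6f) are normalized, the x_I component of the drift reduces to β x_{S-I} − α x_I, which gives a simple consistency check.

```python
from math import comb


def sis_regular_rules(x, N, n, alpha, beta):
    """Shift vectors and rates of the pair-based SIS model on an n-regular network.

    x = (x_I, x_SI) as in (6a). Returns a list of (r_j, q_plus_j, q_minus_j).
    """
    x_I, x_SI = x
    x_S = N - x_I
    p_S = x_SI / (n * x_S) if x_S > 0 else 0.0   # P(stub of an S node leads to an I node)
    p_I = x_SI / (n * x_I) if x_I > 0 else 0.0   # P(stub of an I node leads to an S node)
    rules = []
    for j in range(n + 1):
        r = (1, n - 2 * j)                                                # (6b)
        # (6f): one infectious neighbour is known, the other n - 1 are drawn at random.
        infer_S = (comb(n - 1, j - 1) * p_S ** (j - 1) * (1 - p_S) ** (n - j)
                   if j >= 1 else 0.0)
        # (6e): the n neighbours of the infectious centre are drawn at random.
        infer_I = comb(n, j) * p_I ** (n - j) * (1 - p_I) ** j
        rules.append((r, beta * x_SI * infer_S, alpha * x_I * infer_I))   # (6c), (6d)
    return rules


def sis_regular_drift(x, N, n, alpha, beta):
    """Drift vector a(x) of (3b) for this model."""
    a = [0.0, 0.0]
    for r, qp, qm in sis_regular_rules(x, N, n, alpha, beta):
        a[0] += r[0] * (qp - qm)
        a[1] += r[1] * (qp - qm)
    return a
```

Feeding this drift to an ODE solver gives the mean-value curves of the kind compared in Fig. 2; the covariance follows from (4) in the same way.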

Figure 2 compares the results produced by this simplified model (defined by W, x and Y) to the corresponding full one (defined by V and Z). Figure 3 shows the probability distribution for the same data. Note that, although presented differently, this model corresponds to the one presented in [18]. Fig. 1 is provided for comparison with Fig. 2(c) of [18].

3.1.2 Erdos-Renyi network 

We now consider the case where the network is an Erdos-Renyi network: there are N nodes in the network and M links are randomly assigned. This knowledge constrains two of the five node and pair motifs (i.e., x_S = N − x_I and x_{S-S} = M − x_{S-I} − x_{I-I}) and a state vector of three elements suffices

$$
\mathbf{x} = (x_I,\ x_{S\text{-}I},\ x_{I\text{-}I}) . \tag{7a}
$$



The method used in Section 3.1.1 has to be adapted since the degree of each node is not constrained to a single value. Indeed, a susceptible node that gets infected may a priori be the center of any of the SΓ_1(k_S, k_I) motifs. Still, we could design a bijective mapping between the vector of integers k = (k_S, k_I) and an event type j.

The details of the chosen mapping do not matter: we simply define the forward transition event of type k as the infection of the central node of an SΓ_1(k_S, k_I) motif, which may conveniently be noted SΓ_1(k) instead. Similarly, the
Figure 1: (Online version in colour.) Distribution of post-transient (t → ∞) outcomes as predicted by the SIS model on a regular network (6) using the approximations (3)-(4). The axes (proportion of I among node motifs vs proportion of S-I among pair motifs), network structure (N = 10^5 nodes, each of degree n = 5) and parameters (α = 0.1 and β = 0.05) are the same as for Fig. 2(c) from [18]. Frequencies (in percent) are used to facilitate comparison with [18]; they are simply obtained from the probability densities of (5) multiplied by 100.




Figure 2: (Online version in colour.) Time evolution of the number of infectious nodes x_I for SIS dynamics (α = 2 and β = 1) on a regular network of N = 10^3 nodes (20 initially infectious) of degree n = 5. Curves: results for the simplified system (6) approximated with (3)-(4). The continuous curve shows the mean value while the dashed curves delimit the range of one standard deviation above and below the mean. Symbols: averaged results of 10^5 numerical simulations of the full system. The parameters α and β correspond to those of Fig. 1 after rescaling the time unit.




Figure 3: (Online version in colour.) Probability distribution at different times for the number of infectious nodes x_I. All parameters are the same as in Fig. 2. Curves: Gaussian approximation for the simplified system. Symbols: binned results of 10^5 numerical simulations of the full system.
Figure 4: (Online version in colour.) Time evolution of the number of infectious nodes x_I for SIS dynamics (α = 2 and β = 1) on an Erdos-Renyi network of N = 10^3 nodes (20 initially infectious) and M = 5 · 10^3 links. The mean and range of one standard deviation above and below the mean are shown. Curves: simplified system. Symbols: full system (10^5 simulations).

backward transition event of type k is defined as the recovery of the central node of an IΓ_1(k) motif. The corresponding shift vector and rate functions are

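$$
\mathbf{r}^{\mathbf{k}} = (1,\ k_S - k_I,\ k_I) \tag{7b}
$$

$$
q_{\mathbf{k}}^+(\mathbf{x}, Y) = \beta\, x_{S\text{-}I}\; P\big(S\Gamma_1(\mathbf{k}) \,\big|\, S, k_I \geq 1, \mathbf{x}, Y\big) \tag{7c}
$$

$$
q_{\mathbf{k}}^-(\mathbf{x}, Y) = \alpha\, x_I\; P\big(I\Gamma_1(\mathbf{k}) \,\big|\, I, \mathbf{x}, Y\big) \tag{7d}
$$

where the inference terms bear the same meaning as their previous counterparts. Again assuming a large network size, these inference terms are obtained by evaluating the probability for each pair motif to include the considered node. Hence, the products of binomial distributions

$$
P\big(S\Gamma_1(\mathbf{k}) \,\big|\, S, k_I \geq 1, \mathbf{x}, Y\big) = \binom{2x_{S\text{-}S}}{k_S} x_S^{-k_S} \left(1 - x_S^{-1}\right)^{2x_{S\text{-}S} - k_S} \binom{x_{S\text{-}I} - 1}{k_I - 1} x_S^{-(k_I - 1)} \left(1 - x_S^{-1}\right)^{x_{S\text{-}I} - k_I} \tag{7e}
$$

$$
P\big(I\Gamma_1(\mathbf{k}) \,\big|\, I, \mathbf{x}, Y\big) = \binom{x_{S\text{-}I}}{k_S} x_I^{-k_S} \left(1 - x_I^{-1}\right)^{x_{S\text{-}I} - k_S} \binom{2x_{I\text{-}I}}{k_I} x_I^{-k_I} \left(1 - x_I^{-1}\right)^{2x_{I\text{-}I} - k_I} \tag{7f}
$$

complete the rules W for the pair-based SIS model on an Erdos-Renyi network of size N with M links.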
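The products of binomial distributions (7e)-(7f) can be evaluated directly. In this sketch (function names are hypothetical), each S end of the 2 x_{S-S} S-S motifs and of the x_{S-I} − 1 other S-I motifs attaches to the considered susceptible node with probability 1/x_S, and likewise with probability 1/x_I for an infectious node; summing over all k should return 1.

```python
from math import comb


def binom_pmf(m, k, p):
    """Probability of k successes among m independent trials of probability p."""
    return comb(m, k) * p ** k * (1 - p) ** (m - k) if 0 <= k <= m else 0.0


def infer_S(k_S, k_I, x_S, x_SS, x_SI):
    """(7e): P(S Gamma_1(k_S, k_I) | S, k_I >= 1, x, Y) on an Erdos-Renyi network."""
    p = 1 / x_S
    # 2 x_SS susceptible ends of S-S motifs, x_SI - 1 S-I motifs besides the known one
    return binom_pmf(2 * x_SS, k_S, p) * binom_pmf(x_SI - 1, k_I - 1, p)


def infer_I(k_S, k_I, x_I, x_SI, x_II):
    """(7f): P(I Gamma_1(k_S, k_I) | I, x, Y) on an Erdos-Renyi network."""
    p = 1 / x_I
    return binom_pmf(x_SI, k_S, p) * binom_pmf(2 * x_II, k_I, p)
```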

Figure 4 compares the results produced by this simplified model to the corresponding full one. Section 4.2 discusses
these results and provides further details concerning pair-based models. 

3.2 First neighbourhood SIS model 

We consider a full model (V and Z) for SIS dynamics on a configuration model (CM) network: given a sequence {n_0, n_1, n_2, ...}, links are randomly assigned between nodes such that, for each degree κ, there are n_κ nodes of degree κ. In a computer simulation, we create n_κ nodes with κ stubs for each possible κ and then randomly pair stubs to form links. No particular mechanism is used to prevent the formation of repeated links and self-loops: this simplifies the analytical treatment and has little effect when the network size is sufficiently large.
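The stub-pairing construction just described can be sketched as follows. Shuffling the list of stubs and pairing consecutive entries produces a uniformly random matching; as stated above, self-loops and repeated links are not rejected. The function name and the degree_counts[κ] = n_κ input format are assumptions for illustration.

```python
import random


def configuration_model(degree_counts, rng=None):
    """Random multigraph with n_kappa nodes of degree kappa.

    degree_counts: list with degree_counts[kappa] = n_kappa
    Returns (degrees, edges); self-loops and repeated links are kept.
    """
    rng = rng or random.Random()
    degrees = [kappa for kappa, n_kappa in enumerate(degree_counts)
               for _ in range(n_kappa)]
    stubs = [node for node, kappa in enumerate(degrees) for _ in range(kappa)]
    if len(stubs) % 2:
        raise ValueError("the total degree must be even")
    rng.shuffle(stubs)                          # uniformly random pairing of stubs
    edges = list(zip(stubs[0::2], stubs[1::2]))
    return degrees, edges
```

With the degree sequence {0, 50, 200, 450, 300} used in Fig. 5, this builds 10^3 nodes joined by 1500 links.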

Our simplified model handles the heterogeneity in node degree by enumerating every possible first neighbourhood motif in its state vector

$$
\mathbf{x} = \big(x_{S\Gamma_1(0,0)},\ x_{S\Gamma_1(1,0)},\ \ldots,\ x_{I\Gamma_1(0,0)},\ \ldots\big) . \tag{8a}
$$

Although this vector is infinite in the general case, it becomes finite when, e.g., the prior information Y states that no node has a degree greater than K.



For the same reasons that models tracking node and pair motifs (Section 3.1) had their transition events defined in terms of first neighbourhood motifs, the transition events are here defined in terms of second neighbourhood motifs: a central node, its neighbours and the neighbours of those neighbours. In the same way that we note νΓ_1(k) the first neighbourhood motif formed by a state ν central node with neighbourhood specified by k, we note νΓ_2(K) the second neighbourhood motif formed by a state ν central node with neighbourhood specified by K.

The elements of K may be indexed with first neighbourhood motifs: the central node has K_{ν'Γ_1(k')} state ν' neighbours whose other neighbours (i.e., excluding the central node) are specified by k'. Hence, the second neighbourhood motif is noted SΓ_2(K) with all elements of K zero except for K_{SΓ_1(…)} = 1, K_{IΓ_1(0,1)} = 1 and K_{SΓ_1(1,1)} = 2. Note that the central node of this second neighbourhood motif is also the central node of the first neighbourhood motif SΓ_1(3, 1). In general, we note νΓ_1(K) the first neighbourhood motif that shares the same central node as the second neighbourhood motif νΓ_2(K).

We digress further to introduce the unit vector notation e_M, where M represents a motif; all the elements of this vector are zero except for the M-th, which is one. The total number of elements in e_M should be clear from the context. As a concrete example, the right-hand side of (6b) could be noted e_I + (n − 2j) e_{S-I}.



Similarly to Section 3.1.2, we define the forward transition event of type K to be the infection of the central node of an SΓ_2(K) motif and the backward transition event of type K as the recovery of the central node of an IΓ_2(K) motif. The corresponding shift vector is

$$
\begin{aligned}
\mathbf{r}^{\mathbf{K}} = {} & \mathbf{e}_{I\Gamma_1(\mathbf{K})} - \mathbf{e}_{S\Gamma_1(\mathbf{K})} \\
& + \sum_{\nu} \sum_{\mathbf{k}} K_{\nu\Gamma_1(\mathbf{k})} \left( \mathbf{e}_{\nu\Gamma_1(\mathbf{k} + \mathbf{e}_I)} - \mathbf{e}_{\nu\Gamma_1(\mathbf{k} + \mathbf{e}_S)} \right) .
\end{aligned} \tag{8b}
$$

The first line shows the direct effect of a change of state in the central node while the second one handles the "collateral effect" on its immediate neighbours. Here the unit vector e_ν has the same dimension as k (i.e., two) while e_{νΓ_1(k)} has the same dimension as x. Sums are taken over all the accessible values of ν and k.
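Because most elements of r^K vanish, it is convenient to store (8b) as a sparse mapping from motifs to changes. A minimal sketch follows, assuming a dictionary encoding of K keyed by (ν', k') pairs (a hypothetical representation) and deriving the central node's own neighbourhood from K itself. Every term adds one motif and removes another, so the entries sum to zero.

```python
from collections import Counter

UNIT = {"S": (1, 0), "I": (0, 1)}   # e_S and e_I in the (k_S, k_I) space


def plus(k, nu):
    """Return k + e_nu for a neighbourhood vector k = (k_S, k_I)."""
    return tuple(a + b for a, b in zip(k, UNIT[nu]))


def infection_shift(K):
    """Sparse shift vector r^K of eq. (8b).

    K maps (nu, k) -> number of state-nu neighbours of the susceptible central
    node whose *other* neighbours are described by k = (k_S, k_I).
    Returns a Counter mapping motifs (state, (k_S, k_I)) to their change.
    """
    central = (sum(c for (nu, _), c in K.items() if nu == "S"),
               sum(c for (nu, _), c in K.items() if nu == "I"))
    r = Counter()
    r[("I", central)] += 1       # direct effect: S Gamma_1(K) becomes I Gamma_1(K)
    r[("S", central)] -= 1
    for (nu, k), count in K.items():
        # collateral effect: each such neighbour now sees an I instead of an S
        r[(nu, plus(k, "I"))] += count
        r[(nu, plus(k, "S"))] -= count
    return r


# Hypothetical example: a susceptible centre with one S neighbour of
# neighbourhood (0, 0), one I neighbour of neighbourhood (0, 1) and two
# S neighbours of neighbourhood (1, 1); its own motif is S Gamma_1(3, 1).
example = infection_shift({("S", (0, 0)): 1, ("I", (0, 1)): 1, ("S", (1, 1)): 2})
```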
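The corresponding rate functions are

$$
q_{\mathbf{K}}^+(\mathbf{x}, Y) = \beta\, x_{S\Gamma_1(\mathbf{K})} \Big( \sum_{\mathbf{k}} K_{I\Gamma_1(\mathbf{k})} \Big)\, P\big(S\Gamma_2(\mathbf{K}) \,\big|\, S\Gamma_1(\mathbf{K}), \mathbf{x}, Y\big) \tag{8c}
$$

$$
q_{\mathbf{K}}^-(\mathbf{x}, Y) = \alpha\, x_{I\Gamma_1(\mathbf{K})}\, P\big(I\Gamma_2(\mathbf{K}) \,\big|\, I\Gamma_1(\mathbf{K}), \mathbf{x}, Y\big) \tag{8d}
$$

Note that, unlike (6c)-(6d) and (7c)-(7d), the inference terms in (8c)-(8d) have the same form: the probability for a motif to be a νΓ_2(K) knowing (in addition to x and Y) that its central node is also the central node of a νΓ_1(K) motif. Again assuming a large network size, they are provided by a product of multinomial distributions

$$
P\big(\nu\Gamma_2(\mathbf{K}) \,\big|\, \nu\Gamma_1(\mathbf{K}), \mathbf{x}, Y\big) = \prod_{\nu'} \left[ \Big( \sum_{\mathbf{k}} K_{\nu'\Gamma_1(\mathbf{k})} \Big)! \prod_{\mathbf{k}} \frac{1}{K_{\nu'\Gamma_1(\mathbf{k})}!} \left( \frac{(k_\nu + 1)\, x_{\nu'\Gamma_1(\mathbf{k} + \mathbf{e}_\nu)}}{\sum_{\mathbf{k}'} k'_\nu\, x_{\nu'\Gamma_1(\mathbf{k}')}} \right)^{K_{\nu'\Gamma_1(\mathbf{k})}} \right] \tag{8e}
$$

which completes the rules W for the first neighbourhood SIS model.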
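A sketch of the multinomial inference term (8e) for the SIS case (two neighbour states) follows, assuming the same hypothetical dictionary encoding of x and K. For each neighbour state ν', the neighbours are distributed over the neighbourhood classes k with probabilities proportional to (k_ν + 1) x_{ν'Γ_1(k + e_ν)}, normalized by the total number of ν stubs carried by state ν' nodes.

```python
from math import factorial

UNIT = {"S": (1, 0), "I": (0, 1)}
INDEX = {"S": 0, "I": 1}


def plus(k, nu):
    """Return k + e_nu for k = (k_S, k_I)."""
    return tuple(a + b for a, b in zip(k, UNIT[nu]))


def second_neighbourhood_probability(nu, K, x):
    """Large-network approximation (8e) of P(nu Gamma_2(K) | nu Gamma_1(K), x, Y).

    nu: state of the central node, "S" or "I"
    K: dict (nu_p, k) -> count, describing the central node's neighbours
    x: dict (state, (k_S, k_I)) -> number of first neighbourhood motifs
    """
    prob = 1.0
    for nu_p in ("S", "I"):
        counts = {k: c for (s, k), c in K.items() if s == nu_p and c > 0}
        if not counts:
            continue
        # total number of nu-stubs carried by state-nu_p nodes (denominator of 8e)
        total = sum(k[INDEX[nu]] * n for (s, k), n in x.items() if s == nu_p)
        if total == 0:
            return 0.0
        coeff = factorial(sum(counts.values()))      # multinomial coefficient ...
        for k, c in counts.items():
            coeff //= factorial(c)                   # ... m! / prod_k c_k!
            weight = (k[INDEX[nu]] + 1) * x.get((nu_p, plus(k, nu)), 0)
            prob *= (weight / total) ** c
        prob *= coeff
    return prob
```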

Figure 5 compares the results produced by this simplified model and the full one. Note that this is a stochastic version of the model presented in [11], except that the network structure is here static.



3.3 First neighbourhood SIR model 

As in Sec. 3.2, we consider a full network model where the network structure is specified solely by the degree of its nodes. However, this time we consider SIR epidemiological dynamics: the accessible node states are ν ∈ {S, I, R}, infection is the same as in SIS but recovery is replaced by removal (see the introduction of Sec. 3 for details).

We define the forward transition event of type 𝓘_K to be the infection of the central node of an SΓ_2(K) motif, while a forward transition event of type 𝓡_K is the removal of the central node of an IΓ_2(K) motif. There is no backward
Figure 5: (Online version in colour.) Time evolution of the number of infectious nodes for SIS dynamics (α = 1 and β = 1) on a CM network for which the number of nodes of each degree is prescribed by the sequence {0, 50, 200, 450, 300} (total N = 10^3 nodes) with 2% of the nodes of each degree initially infectious. The mean and range of one standard deviation above and below the mean are shown. Curves: simplified system. Symbols: full system (10^5 simulations).



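transition events. The model is specified by

$$
\mathbf{x} = \big(\ldots,\ x_{S\Gamma_1(\mathbf{k})},\ \ldots,\ x_{I\Gamma_1(\mathbf{k})},\ \ldots,\ x_{R\Gamma_1(\mathbf{k})},\ \ldots\big) \tag{9a}
$$

$$
\mathbf{r}^{\mathcal{I}_{\mathbf{K}}} = \mathbf{e}_{I\Gamma_1(\mathbf{K})} - \mathbf{e}_{S\Gamma_1(\mathbf{K})} + \sum_{\nu} \sum_{\mathbf{k}} K_{\nu\Gamma_1(\mathbf{k})} \left( \mathbf{e}_{\nu\Gamma_1(\mathbf{k} + \mathbf{e}_I)} - \mathbf{e}_{\nu\Gamma_1(\mathbf{k} + \mathbf{e}_S)} \right) \tag{9b}
$$

$$
\mathbf{r}^{\mathcal{R}_{\mathbf{K}}} = \mathbf{e}_{R\Gamma_1(\mathbf{K})} - \mathbf{e}_{I\Gamma_1(\mathbf{K})} + \sum_{\nu} \sum_{\mathbf{k}} K_{\nu\Gamma_1(\mathbf{k})} \left( \mathbf{e}_{\nu\Gamma_1(\mathbf{k} + \mathbf{e}_R)} - \mathbf{e}_{\nu\Gamma_1(\mathbf{k} + \mathbf{e}_I)} \right) \tag{9c}
$$

$$
q_{\mathcal{I}_{\mathbf{K}}}^+(\mathbf{x}, Y) = \beta\, x_{S\Gamma_1(\mathbf{K})} \Big( \sum_{\mathbf{k}} K_{I\Gamma_1(\mathbf{k})} \Big)\, P\big(S\Gamma_2(\mathbf{K}) \,\big|\, S\Gamma_1(\mathbf{K}), \mathbf{x}, Y\big) \tag{9d}
$$

$$
q_{\mathcal{R}_{\mathbf{K}}}^+(\mathbf{x}, Y) = \alpha\, x_{I\Gamma_1(\mathbf{K})}\, P\big(I\Gamma_2(\mathbf{K}) \,\big|\, I\Gamma_1(\mathbf{K}), \mathbf{x}, Y\big) \tag{9e}
$$

$$
q_{\mathcal{I}_{\mathbf{K}}}^-(\mathbf{x}, Y) = q_{\mathcal{R}_{\mathbf{K}}}^-(\mathbf{x}, Y) = 0 \tag{9f}
$$

where the inference terms are the same as in (8e).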



3.4 First neighbourhood on-the-fly SIR model 



We now take a different perspective on the problem considered in Sec. 3.3, one that requires tracking far fewer elements in the state vector. Instead of considering "complete" first neighbourhood motifs, such as νΓ_1(k), that specify the state of each of the central node's neighbours, we define the νΛ_1(κ) motif as a central node of state ν for which we know that it has κ neighbours unknown to us. This last statement is important: were we to learn the state of one of these neighbours, this would cease to be a νΛ_1(κ) motif and instead become a νΛ_1(κ − 1) one. As usual, the state vector tracks the number of such motifs

$$
\mathbf{x} = \big(\ldots,\ x_{S\Lambda_1(\kappa)},\ \ldots,\ x_{I\Lambda_1(\kappa)},\ \ldots,\ x_{R\Lambda_1(\kappa)},\ \ldots\big) . \tag{10a}
$$



We recall from Sec. 3.2 how a CM network is built in a computer simulation: for each κ, n_κ nodes with κ stubs are created and the stubs are then randomly paired to form links. From this perspective, νΛ_1(κ) may be reinterpreted as a state ν node with κ unpaired stubs: as stubs are removed once they are paired in the computer simulation, neighbours that were unknown are removed from these motifs once they become known to us. Hence,

$$
\frac{\kappa\, x_{\nu\Lambda_1(\kappa)} - \delta_{\nu\nu'}\,\delta_{\kappa\kappa'}}{\sum_{\nu''} \sum_{\kappa''} \kappa''\, x_{\nu''\Lambda_1(\kappa'')} - 1}
$$

exactly gives the probability for an unknown neighbour of the central node of ν'Λ_1(κ') to be the central node of νΛ_1(κ). Note that the Kronecker deltas (δ_ab = 1 if a = b and 0 otherwise) in the numerator and the −1 in the denominator both account for the stub of ν'Λ_1(κ') that we are pairing with a random stub.
Figure 6: (Online version in colour.) Time evolution of the number of infectious and removed nodes for SIR dynamics (α = 1 and β = 1) on a CM network for which the number of nodes of each degree is prescribed by the sequence {0, 50, 200, 450, 300} (total N = 10^3 nodes) with 2% of the nodes of each degree initially infectious (all others are susceptible). The mean and range of one standard deviation above and below the mean are shown. Curves: simplified system. Symbols: full system (10^5 simulations).

A typical computer simulation would first build the network and then perform the SIR propagation dynamics on this network. However, we do not want to have to store the network structure for later consultation, which would require additional space in x. Instead, we delay the network construction, leaving the stubs unpaired, and start the propagation dynamics right away. Only when the state of an unknown neighbour is required do we pair the corresponding stub with a randomly selected one, hence building the network on-the-fly. Since the knowledge of which stubs were matched will be lost in the future, this information must only be required at the very moment it is obtained if we want the resulting dynamics to exactly reproduce the behaviour of the full system.

We thus take a different, although equivalent, perspective on the infection dynamics where each link is "probed" at most once. Instead of considering a probability β dt of infection for each susceptible neighbour of infectious nodes, we consider the same probability for each of their unknown neighbours. Only when such an infection attempt occurs do we determine the state of the neighbour, whose state changes to infectious if and only if it was previously susceptible. In any case, we have learned one neighbour of each of two nodes (i.e., the infectious node and its neighbour) and we must update the state vector accordingly.

Hence, we define the Infection transition event $\mathcal{I}_{\nu\kappa\kappa'}$ such that an infectious node at the center of an $I\Lambda_1(\kappa')$ motif attempts to infect the ν-state node at the center of a $\nu\Lambda_1(\kappa)$ motif. Of course, only $\mathcal{I}_{S\kappa\kappa'}$ transition events result in real infections. The more traditional transition event $\mathcal{R}_\kappa$ corresponds to the Removal of the infectious node at the center of an $I\Lambda_1(\kappa)$ motif, which thus becomes $R\Lambda_1(\kappa)$. The model is specified by

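$$\mathbf{r}^{\mathcal{I}_{S\kappa\kappa'}} = \mathbf{e}_{I\Lambda_1(\kappa-1)} - \mathbf{e}_{S\Lambda_1(\kappa)} + \mathbf{e}_{I\Lambda_1(\kappa'-1)} - \mathbf{e}_{I\Lambda_1(\kappa')} \qquad (10b)$$

$$\mathbf{r}^{\mathcal{I}_{I\kappa\kappa'}} = \mathbf{e}_{I\Lambda_1(\kappa-1)} - \mathbf{e}_{I\Lambda_1(\kappa)} + \mathbf{e}_{I\Lambda_1(\kappa'-1)} - \mathbf{e}_{I\Lambda_1(\kappa')} \qquad (10c)$$

$$\mathbf{r}^{\mathcal{I}_{R\kappa\kappa'}} = \mathbf{e}_{R\Lambda_1(\kappa-1)} - \mathbf{e}_{R\Lambda_1(\kappa)} + \mathbf{e}_{I\Lambda_1(\kappa'-1)} - \mathbf{e}_{I\Lambda_1(\kappa')} \qquad (10d)$$

$$\mathbf{r}^{\mathcal{R}_{\kappa}} = \mathbf{e}_{R\Lambda_1(\kappa)} - \mathbf{e}_{I\Lambda_1(\kappa)} \qquad (10e)$$

$$\mathcal{W}_{\mathcal{I}_{\nu\kappa\kappa'}}(\mathbf{x},\mathcal{Y}) = \beta\kappa'\,x_{I\Lambda_1(\kappa')}\,\frac{\kappa\,x_{\nu\Lambda_1(\kappa)} - \delta_{\nu I}\,\delta_{\kappa\kappa'}}{\sum_{\nu''}\sum_{\kappa''}\kappa''\,x_{\nu''\Lambda_1(\kappa'')} - 1} \qquad (10f)$$

$$\mathcal{W}_{\mathcal{R}_{\kappa}}(\mathbf{x},\mathcal{Y}) = \alpha\,x_{I\Lambda_1(\kappa)} \qquad (10g)$$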
w(x,^) = ^(x,y) = o . doh) 

The system (10) exactly reproduces the behaviour of the full system through the solution of (2). Since (3)-(4) are only approximations of (2), results obtained through these relationships are only approximate (Fig. 6 and Fig. 7). This model may be solved analytically for the mean value (see ESM [7] §III) and the results are in agreement with [19, 20].
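The on-the-fly construction and the probing perspective described above can be illustrated with a direct (Gillespie-type) stochastic simulation. The sketch below is ours, not the authors' code: the function `on_the_fly_sir` and its arguments are hypothetical, and we assume a configuration-model network given only by its degree sequence, a transmission rate β per unmatched stub of each infectious node and a removal rate α per infectious node. Links are never stored: each node keeps only its number of unmatched stubs, as in the $\nu\Lambda_1(\kappa)$ motifs, and a stub is paired with a uniformly chosen other unmatched stub only when it is probed.

```python
import random


def on_the_fly_sir(degrees, initially_infectious, beta, alpha, t_max, seed=None):
    """Sketch: SIR on a configuration-model network that is built on the fly.

    degrees[i] is the degree of node i. Only the number of unmatched stubs of
    each node is kept; the links themselves are never stored.
    Returns a list of (t, S, I, R) recorded after every event.
    """
    if sum(degrees) % 2:
        raise ValueError("the sum of degrees must be even to pair all stubs")
    rng = random.Random(seed)
    n = len(degrees)
    state = ["S"] * n
    for i in initially_infectious:
        state[i] = "I"
    stubs = list(degrees)  # unmatched stubs per node ("unknown" neighbours)
    t = 0.0
    history = [(t, state.count("S"), state.count("I"), state.count("R"))]
    while True:
        infectious = [i for i in range(n) if state[i] == "I"]
        lam = sum(stubs[i] for i in infectious)  # unmatched stubs of infectious nodes
        total_rate = beta * lam + alpha * len(infectious)
        if total_rate == 0:
            break  # nothing can happen anymore
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        if rng.random() * total_rate < beta * lam:
            # Probe: pick an unmatched stub of an infectious node uniformly, then pair
            # it with a uniformly chosen *other* unmatched stub. The total number of
            # unmatched stubs stays even, so at least one partner stub exists.
            # Self-loops and multi-links are allowed, as in plain stub pairing.
            src = rng.choices(infectious, weights=[stubs[i] for i in infectious])[0]
            weights = [stubs[j] - (1 if j == src else 0) for j in range(n)]
            tgt = rng.choices(range(n), weights=weights)[0]
            stubs[src] -= 1
            stubs[tgt] -= 1
            if state[tgt] == "S":
                state[tgt] = "I"  # only a susceptible target changes state
        else:
            # Removal: a uniformly chosen infectious node becomes removed; its
            # unmatched stubs remain available for later pairings.
            state[rng.choice(infectious)] = "R"
        history.append((t, state.count("S"), state.count("I"), state.count("R")))
    return history
```

For example, `on_the_fly_sir([3] * 100, range(5), beta=1.0, alpha=1.0, t_max=100.0, seed=1)` returns one realization on a 3-regular configuration-model network of 100 nodes. Each event costs O(N) operations in this sketch, which keeps the code short at the expense of speed.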



Moreover, ESM [7] §IV shows how (10) may be rewritten with a state vector two thirds the size of (10a). This is a generalization to the case α ≠ 0 of the model presented in [14]. Further details are discussed in Sec. 4.4. We note that a conceptually similar approach has recently been developed independently [21] as a tool for a mathematically rigorous proof that a specific heterogeneous mean field model [19] holds in the limit of large network size.

Figure 7: (Online version in colour.) Probability distribution at different times for the number of infectious and removed nodes. The parameters are the same as in Fig. 6. Curves: simplified system. Symbols: full system ($10^5$ simulations).

4 Discussion 

We now take a retrospective look at the results presented in Sec. 3 and draw from these special examples general considerations about our modelling approach.

4.1 Accuracy of the results 

One of the aims of this paper is to obtain simplified models that accurately reproduce the behaviour of complex systems. Since approximations are usually involved, the results of the simplified model can be expected to agree with those of the full system only over some range of parameters, where the approximations are valid.

The parameters used in Figs. 2-7 were chosen to probe the limits of our approximations: while the results of the full and simplified systems do not correspond perfectly, their agreement is probably sufficient for both qualitative and quantitative applications. We distinguish between two categories of approximations: those inherent to the use of (3)-(4) and those due to the imperfect representation of Z through x and Y.

4.1.1 Gaussian approximation 

Since (2) and (10) define a system that exactly reproduces the behaviour of the corresponding full system, any discrepancy in Fig. 6 must originate from the use of the Gaussian approximation (3)-(4). An important requirement for this approximation to be valid is that the size N of the system be large.

Figures 2-7 all use networks of size N = 1000. As a rule of thumb, we found that (3)-(4) perform well for networks of at least a few hundred nodes, which is the case for many relevant real-world systems. Note that, for very small systems (tens of nodes), one could also solve (2) directly and completely.

While a large network size N is required to justify treating the elements of x as real numbers, other phenomena may affect the validity of this approximation. For example, when the initial conditions are such that there is a single infectious node, the continuous approximation fails to account for the probability that this node recovers (or is removed) before transmitting the infection to any of its neighbours. Figures 2-7 circumvent this problem by using an initial condition with 20 infectious nodes: the probability that all of them recover (or are removed) before transmitting the infection is very low.
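A rough estimate makes the last statement concrete. As a simplifying assumption (ours, not part of the model), suppose each initially infectious node has κ links to susceptible nodes, transmits along each of them at rate β and is removed at rate α. With competing exponential clocks, it is removed before its first transmission attempt with probability α/(α + κβ); treating the initially infectious nodes as independent, the probability that none of them ever transmits is the product of these factors. The helper name and the parameter values below are illustrative.

```python
from math import prod


def prob_all_removed_first(degrees, beta, alpha):
    """Probability that every initially infectious node is removed before its
    first transmission attempt.

    Assumptions (illustrative): node i has degrees[i] links, all to susceptible
    nodes, each transmitting at rate beta; removal occurs at rate alpha; the
    initially infectious nodes behave independently.
    """
    return prod(alpha / (alpha + k * beta) for k in degrees)
```

With β = α and κ = 3, a single initially infectious node fails to transmit with probability 0.25, whereas for 20 such nodes the probability drops to $0.25^{20} \approx 9.1 \times 10^{-13}$.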

It is worth noting that the plateaux seen in Figs. 2 and 4-6 reflect different dynamical behaviours for the SIS and SIR systems. Indeed, while the total number of removed nodes reaches a maximum in the SIR system because no infectious nodes are left to recover, the steady state observed at later times for our SIS models corresponds to a constant flow of recoveries and new infections. In the former case, the approximation errors made at earlier times accumulate. In the latter, the exact path taken to reach equilibrium matters less, and errors do not accumulate in the same way.



4.1.2 Representation approximation 

In general, the simplified system will not exactly reproduce the behaviour of the full system, even when using (2) instead of (3)-(4). This is the case for all our SIS models: while some of the discrepancy seen in Figs. 2-5 is explained by the Gaussian approximation, the imperfect representation of Z also contributes to the error.

Part of the problem can be understood as our failure to consider the correlation between the neighbours of a node and the time elapsed since this node entered its present intrinsic state. For example, the neighbours of a susceptible node that has just recovered (i.e., it was infectious a moment ago) may differ markedly from those of a susceptible node that recovered a long time ago, while being similar to those of a node that is still infectious. Hence, one could hope to improve these SIS models through changes in Y alone (i.e., with the same x): first estimate the probability distribution for the time since each node last changed state and then infer the neighbourhoods accordingly. An alternative that could be simpler to implement, at the cost of increasing the size of x, would consist in tracking more exhaustive motifs (e.g., second neighbourhoods instead of first ones in Sec. 3.2).

However, the recovery of infectious nodes on a structure that is fixed in time has more intricate consequences: if at some point all the nodes of the same component (i.e., a connected subnetwork that is disconnected from the rest of the network) are susceptible at the same time, then none of them may ever become infectious again. The connectivity of a network is strongly affected by the average degree of its nodes: our parameters correspond to an average degree of 5 for Figs. 2-4 (average degree of a neighbour also 5) and of 3 for Fig. 5 (average degree of a neighbour ≈ 3.23). For smaller average degrees, this component-induced discrepancy becomes much larger, since the simplified model then overestimates the number of infectious nodes. One could take the components into account by solving independent systems for each component (and merging the results afterwards) or by a clever adaptation of the inference process (see Sec. 4.6 for possible directions). Note that these effects are usually much less important when the network structure changes over time.
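The "average degree of a neighbour" quoted above is the mean degree of a node reached by following a uniformly chosen link, ⟨k²⟩/⟨k⟩, since such a node has degree k with probability proportional to k times the number of degree-k nodes. A short sketch (the function name is ours); we assume that the counts {0, 50, 200, 450, 300} from the caption of Fig. 6 refer to degrees 0 to 4, which is our reading of that caption.

```python
def mean_and_neighbour_degree(counts):
    """counts[k] is the number of nodes of degree k.

    Returns (<k>, <k^2>/<k>): the average degree, and the average degree of a
    node reached by following a uniformly chosen link.
    """
    n = sum(counts)
    first = sum(k * c for k, c in enumerate(counts)) / n
    second = sum(k * k * c for k, c in enumerate(counts)) / n
    return first, second / first


print(mean_and_neighbour_degree([0, 50, 200, 450, 300]))  # degrees 0-4 (assumed)
print(mean_and_neighbour_degree([0] * 5 + [1000]))        # every node of degree 5
```

Under that assumption, the first distribution gives an average degree of 3 and a neighbour average of about 3.23, the values quoted for Fig. 5, while a network in which every node has degree 5 gives 5 for both quantities.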



4.2 Pair-based models 



Compared to the other models presented in Sec. 3, the two pair-based models of Sec. 3.1 use very small state vectors (i.e., two or three elements). This is an important advantage of pair-based models in general: there are usually far fewer pair and node motifs than, e.g., first neighbourhood ones, and tracking them thus requires a much smaller x.

Although we limited our study of pair-based models to regular and Erdős-Rényi networks, more complex network structures could also be considered. In the same way that the two pair-based models of Sec. 3.1 differ mostly by their inference terms, obtaining good inference from the little information stored in x is probably the principal challenge behind general and accurate pair-based stochastic models.

However, non-stochastic pair-based models are already possible on nontrivial network structures for SIR dynamics or, more generally, for processes such that a change in the state of one neighbour of a node can be treated as independent of that of another neighbour (SIS fails this assumption) [22]. Knowing (in Y) that a system behaves in this manner greatly simplifies the inference process, and this is the main reason for the success of the SIR pair-based model for the evolution of mean values on CM networks presented in [19, 20]. Whether the same approach may be used to obtain stochastic results is an open question.
4.3 First (and higher) neighbourhood models



By contrast, sufficiently accurate inference terms for first neighbourhood models are often straightforward to obtain. Although (8e) may be difficult to appreciate at first sight, it is the only inference term used in both Sec. 3.2 and Sec. 3.3. In fact, (8e) may well be the only inference term needed for generic first neighbourhood models on CM network structures.

Although first neighbourhood motifs are a "natural language" for expressing dynamics taking place on CM networks, they could also be used in the presence of other complex structures. This may be done through changes in x and/or Y; see ESM [7] §II for details.

The generality and ease of design of first neighbourhood models come at a cost: the state vector x is typically much larger than it would be in an equivalent pair model. The size of x strongly depends on the maximal node degree present in the network and on the total number of accessible intrinsic node states (see ESM [7] §II for details). For typical values of these quantities, this does not cause major problems for the evaluation of the mean: numerically solving (3a) requires an acceptable amount of resources even for an x of dimension $10^6$, and (3b) may often be simplified (i.e., summed analytically).

However, evaluating the covariance matrix using (4) may cause problems: unless analytical simplifications are possible, solving this system scales as the square of the number of elements in x. Future developments may alleviate this covariance-matrix bottleneck; see Sec. 4.5 for details. In any case, the size of x may be decreased by "coarse graining" the number of links between the central node and its neighbours; see ESM [7] §II G for details.



4.4 On-the-fly models 



The on-the-fly model presented in Sec. 3.4 for SIR dynamics on CM networks exactly reproduces the behaviour of the full system. This is all the more remarkable given that the state vector of the on-the-fly model is much smaller than that of the alternative first neighbourhood model of Sec. 3.3. The reasons behind the success of the on-the-fly approach are similar to those discussed in Sec. 4.2 for the pair models presented in [22, 19, 20]: it is encoded in Y that, for each link, we need to know the states of both nodes joined by that link simultaneously at most once [14].

The inference term (8e) is "general purpose" in the sense that its Y does not provide information on the dynamical properties of the system, but only on how the motifs in x may be interconnected. This is why both (8) and (9) rely on (8e).



However, the inference terms of (10) have a specific character: Y contains information about (10) itself. Any change to the dynamics implies changes in the inference terms, with no guarantee that an acceptable solution exists. In fact, (10) was designed with this problem in mind. In other words, we obtained a simple and reliable model at the cost of "pre-computations" in the design process. Of all the possibilities in model space, the information acquired by pointing at this specific one is what compensates for the reduced size of the state vector. The same could be said of the deterministic SIR pair-based model on CM networks presented in [19, 20].

By contrast with the case discussed in Sec. 4.3, the small size of the state vector here allows the covariance matrix to be evaluated through (4), even when relatively high degree nodes are present. Alternatively, one may take advantage of the fact that, even for more complicated dynamics, the state vector of on-the-fly models can remain of manageable size for mean value calculations; see the introduction to ESM [7] §II for the concrete example of [13].



4.5 Complicated states vs complex assumptions 

Section 4.4 revealed an unexpected depth to Y: one may achieve models of similar accuracy by trading added complexity in the assumptions for a reduction in the size of the state vector x. As an extreme example, if Y already gives the full behaviour of the system, then there is no need to track any information in x. Without reaching such extremes, our on-the-fly model and the deterministic SIR pair-based model presented in [19, 20] both demonstrate the benefits of investing some time in the assumptions of our models.

While these examples required case-by-case analysis, one may benefit from the same realization in a general context: a first simplified model (W, x and Y) may generate the assumptions Y' of a different simplified model (W, x' and Y'). For example, when some dynamical process (e.g., SIS or SIR) occurs on a network whose structure changes in time independently of this dynamics, one could obtain a first model for the structure alone and then feed its results to a second model handling the remaining dynamics. Even more generally, one could compensate for the higher computational requirements of (4) by first solving (3) on an elaborate model, then feeding the resulting mean values to a simpler model for the sole purpose of estimating the covariance matrix.



4.6 Additional inference tools 

While we introduced Y as a direct application of Bayes' rule, we have now seen that useful assumptions may be obtained by other means, including the solution of another system of the form (3). The next step in this direction would be to improve our inference process using alternative tools and models available to network science.

For example, branching processes [23] may be used to infer information concerning the connectivity and the components of the network structure. As discussed in Sec. 4.1.2, this was a major shortcoming of our SIS models. This approach is even more interesting with the recently developed tools [24, 9] that are particularly compatible with the motif and intrinsic node state approach presented in this paper.

Exponential random networks [25] are another tool of considerable interest. Indeed, these maximum entropy methods can simplify inferences that would otherwise have been prohibitively complex. Once again, this approach may be generalized to different kinds of motifs and intrinsic node states.



5 Conclusion: general applicability 

Although the examples of Sec. 3 focus on SIS and SIR dynamics, any feature that could be modelled through a standard epidemiological compartmental model may a priori be considered by our approach: gender, age groups, vaccination, incubation period, disease phases, etc. Each compartment simply becomes an accessible node state in our formalism; see ESM [7] §I for details.

Furthermore, population dynamics considerations may be accounted for in a straightforward manner. Assuming first 
neighbourhood motifs, births and deaths of individuals correspond to events adding and removing motifs, respectively. 
Similarly, changes in interaction patterns amount to events replacing the affected motifs by new ones. In fact, from the 
model's perspective, there is no important distinction between a change in the interaction structure of the population and 
a change in the node states: both are events affecting motifs. 

The generality of our systematic approach and the fact that its assumptions are explicitly stated suggest that it could be used as a common ground for comparing existing models too complex for direct comparison. Indeed, by considering such an existing model as the full system (specified by $V_1$ and $Z_1$), one may seek a simplified system (specified by $W_1$, $X_1$ and $Y_1$) approximately reproducing the original model (over a sufficient range of parameters).

If some transition event (in $W_1$) appears essential, this may reveal an important feature of the original model; the same holds true for motifs (in $X_1$) and prior knowledge (in $Y_1$). Moreover, assuming that this procedure has been done for a second existing model (specified by $V_2$ and $Z_2$), one may directly compare their simplified versions in a common framework, which will help identify the assumptions required for their description. Note that this perspective is similar to a commutative diagram

$$\begin{array}{ccc}
(V_1, Z_1) & \overset{\text{difficult to compare}}{\longleftrightarrow} & (V_2, Z_2) \\
\Big\updownarrow\ \text{equivalent behaviour} & & \Big\updownarrow\ \text{equivalent behaviour} \\
(W_1, X_1, Y_1) & \overset{\text{comparable}}{\longleftrightarrow} & (W_2, X_2, Y_2)
\end{array}$$

For example, if $X_1 = X_2$ and $Y_1 = Y_2$, we know that the discrepancies between the two original models are attributable to the difference in the transition events. Finding a minimal set of changes to $W_1$ and/or $W_2$ that causes both models to agree may then help identify the very cause of the discrepancies.
Abstract

This document provides supplementary information to "Markov processes on complex networks with applications to spreading dynamics" [1], which will hereafter be referred to as the "main text". Equation numbers from the main text are preceded by the letters "MT".

1 State of the full system (Z) 

The state of the full system is specified by the intrinsic state of all of its components and the network structure governing their interactions. As defined in the main text [1] §II A, a node has at any time a single intrinsic node state ν. We denote by N the total number of accessible node states, which takes its minimal value N = 1 when nodes are intrinsically indiscernible. In addition, we introduce the intrinsic link state ℓ. Similarly, C is the total number of accessible link states, which takes its minimal value C = 1 for intrinsically indiscernible links (C = 1 throughout the main text). Examples of link states include any relevant characteristics of the interactions: their context (e.g., professional, friendship, partnership, ...), their dynamical status (e.g., active or inactive), their weight (e.g., strong, normal, weak, ...), etc.

While each node (or link) may be in only one intrinsic state at any given time, different pieces of information may be encoded in this single state. As in standard compartmental models, a simple Cartesian product may suffice for this task. For example, an SIR dynamics (three epidemiological states, see main text [1] §III) in which we distinguish two genders (male and female) would result in a total of N = 6 accessible node states. Note that there is no particular problem caused by the fact that one of these characteristics may change during the process while the other remains constant.
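The Cartesian-product encoding can be sketched in a few lines. The state labels and the helper `infect` are illustrative names of ours; the point is that an infection changes only the epidemiological component of the node state and leaves the gender component untouched.

```python
from itertools import product

# Illustrative encoding: one intrinsic node state per (epidemiological state, gender) pair.
EPI_STATES = ("S", "I", "R")
GENDERS = ("male", "female")
NODE_STATES = list(product(EPI_STATES, GENDERS))  # 3 x 2 = 6 accessible node states


def infect(node_state):
    """Infection changes the epidemiological component and keeps the gender."""
    epi, gender = node_state
    if epi != "S":
        raise ValueError("only susceptible nodes can be infected")
    return ("I", gender)
```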

In addition to intrinsic link states, links may (or may not) be directed. The links of the networks discussed in the 
main text [1] are undirected: the interaction between the two linked nodes is bidirectional and symmetric. Directed links 
represent interactions that are either unidirectional or asymmetrical. A network that has only undirected links is said to 
be undirected, one that has only directed links is directed and one that has both is semi-directed. 

2 More on motifs 

We introduce new motifs and generalize those presented in the main text [1] for intrinsic link types and directed (or 
semi-directed) networks. Table 1 summarizes the total number of motifs in each of these classes. 

Of high practical importance is the fact that the entries of this table differ widely in their scaling behaviours. For example, the on-the-fly model [2] for two interacting SIR dynamics, each propagating on its own network structure, uses N = 9 (two SIR), C = 3 (links of the first network alone, links of the second network alone and overlapping links) and V = 1 (undirected network). Looking up table 1, we see that this requires of the order of $K^3$ on-the-fly (degreed node) motifs, where K denotes the highest accessible node degree. By contrast, implementing a first neighbourhood version of this model would require of the order of $K^{27}$ motifs!
Table 1: Number of motifs in selected classes. We define V = 1 for undirected, V = 2 for directed and V = 3 for semi-directed networks. Moreover, K denotes the highest accessible node degree.



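Motifs                         Number
Node                           $N$
Pair                           $\frac{1}{2}CN\left[VN + (V-2)^2\right]$
Triple                         $\frac{1}{2}VCN^2\,(VCN + 1)$
Triangle                       $\frac{1}{3}VCN + \frac{1}{2}V\left[(V-2)CN\right]^2 + \frac{1}{6}(VCN)^3$
Degreed node                   $N\binom{K+VC}{VC}$
Degreed pair                   [illegible]
First neighbourhood (node)     $N\binom{K+VCN}{VCN}$
First neighbourhood pair       [illegible]
Second neighbourhood (node)    [illegible]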
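The rows of Table 1 that could be read from the source are easy to evaluate, which makes the scaling contrast discussed above tangible. The formulas in the sketch below are our reading of the table (for triangles, it agrees with a direct count via Burnside's lemma in the cases we checked); the rows we could not read are omitted. The function name is hypothetical, and K = 10 in the usage line is an arbitrary illustrative value.

```python
from math import comb


def motif_counts(N, C, V, K):
    """Number of motifs per class for the readable rows of Table 1.

    N: accessible node states, C: accessible link states,
    V: 1 undirected, 2 directed, 3 semi-directed, K: highest accessible degree.
    All divisions below are exact for integer arguments.
    """
    y = V * C * N
    return {
        "node": N,
        "pair": C * N * (V * N + (V - 2) ** 2) // 2,
        "triple": N * y * (y + 1) // 2,
        "triangle": (2 * y + 3 * V * ((V - 2) * C * N) ** 2 + y**3) // 6,
        "degreed node": N * comb(K + V * C, V * C),
        "first neighbourhood (node)": N * comb(K + y, y),
    }


counts = motif_counts(N=9, C=3, V=1, K=10)
print(counts["degreed node"], counts["first neighbourhood (node)"])
```

For the two interacting SIR dynamics (N = 9, C = 3, V = 1) with K = 10, this gives 2574 degreed node (on-the-fly) motifs against roughly $3.1 \times 10^9$ first neighbourhood motifs.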



2.1 Pair motifs 

We denote by $\nu\overset{\ell}{-}\nu'$ (resp. $\nu\overset{\ell}{\rightarrow}\nu'$) an undirected (resp. directed) pair motif formed of a state ν node linked through a state ℓ link to a state ν' node. In the case of directed motifs, the direction of the arrow usually specifies the "strongest causal effect" of this asymmetric interaction (although this need not be the case). All the undirected and directed motifs are possible in a semi-directed network. We may omit the index ℓ over the links when C = 1.

2.2 Triple motifs 

We denote by $\nu\overset{\ell}{-}\nu'\overset{\ell'}{-}\nu''$ an undirected triple motif formed of a state ν node linked through a state ℓ link to a state ν' node, itself linked through a state ℓ' link to a state ν'' node. As for pair motifs, each of these nodes may have other neighbours than those that are explicitly specified. The notation directly generalizes to directed (e.g., $\nu\overset{\ell}{\rightarrow}\nu'\overset{\ell'}{\leftarrow}\nu''$) and semi-directed (e.g., $\nu\overset{\ell}{-}\nu'\overset{\ell'}{\rightarrow}\nu''$) triple motifs.

The term "2-star" is often used to refer to a triple motif for which both extremities (e.g., the nodes of state ν and ν'' in the motif $\nu\overset{\ell}{-}\nu'\overset{\ell'}{-}\nu''$) are explicitly forbidden to be neighbours. In models that also use triangle motifs, 2-star motifs may make explicit the absence of the last link that would form a triangle. Another common use of triple motifs is in the inference process of (usually deterministic) pair-based models.

2.3 Triangle motifs and other small subnetworks 

Three nodes that are all neighbours of each other form a triangle motif. A horizontal bracket represents the additional link that would be missing in the analogous triple motif, e.g.,

$$\underbrace{\nu\overset{\ell}{-}\nu'\overset{\ell'}{-}\nu''}_{\ell''}$$

for an undirected network.

Triangle motifs are usually considered in models that should account for clustering. Their number may either directly 
be tracked in the state vector x [3] or their implicit presence (stated in Y, e.g., through a clustering coefficient) may be 
accounted for in the inference process [4]. 

The same notation may be generalized to other motifs consisting of small subnetworks, e.g., square motifs [3]. 
2.4 Clique motifs



A vague definition of a clique motif is "a subgroup of nodes that share more links among themselves than what could be expected otherwise for the same number of randomly selected nodes". In applications, one may refine this definition according to the specificities of the problem at hand, e.g., "an Erdős-Rényi subnetwork (link probability p) of $n_S$ susceptible nodes and $n_I$ infectious nodes". Clique motifs are usually considered in models that should account for community structure [5].

2.5 Degreed motifs 

A degreed motif is a motif for which we know the degree of all the nodes forming the motif: a degreed node motif is a node of specified state and degree; a degreed pair motif is two nodes of specified states and degrees that are known to be neighbours; etc. The in-degree and out-degree are both specified in directed and semi-directed networks; the latter also specify undirected degrees. Likewise, degrees pertaining to different types of links are specified independently.

In the same way that pair motifs are usually combined with node motifs, degreed pair motifs are usually combined with degreed node motifs [6]. Note that the on-the-fly motifs $\nu\Lambda_1(\kappa)$ presented in the main text [1] §III D can be understood as a special case of degreed node motifs where the degree is replaced by the "degree to unknown nodes".

2.6 n-th neighbourhood motifs 

Similarly to degreed motifs, an n-th neighbourhood motif is a motif for which we specify the state of all the n-th neighbours of the nodes forming the original motif. Hence, the first (resp. second) neighbourhood motif notation of the main text [1] denotes a first (resp. second) neighbourhood node motif. These concepts directly generalize to types of links and to directed or semi-directed networks.

First neighbourhood node motifs can be understood as tracking the correlation between the state of a node and the 
state of all its neighbours. By contrast, degreed pair motifs track the correlation between the state and degree of two
neighbouring nodes. While similar information may be obtained from both motif classes, a model based on one may 
perform better than a model based on the other depending on the characteristics of the full system to be modelled. 

2.7 Coarse-grained degree and/or neighbourhood 

Not all entries of table 1 depend on the maximal degree K, but those that do grow quickly with K. This is
problematic since many real-world systems contain high degree nodes. 
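The range-based coarse-graining proposed in this subsection amounts to replacing each degree by the index of the range containing it. A minimal sketch, assuming sorted range lower bounds starting at 0; the default bounds reproduce the power-of-two ranges [0, 0], [1, 1], [2, 3], [4, 7], [8, 15], [16, 31] and [32, 63] of the example in this subsection, and the function names are ours.

```python
from bisect import bisect_right

# Lower bounds of the ranges [0,0], [1,1], [2,3], [4,7], [8,15], [16,31], [32,63].
POWER_OF_TWO_LOWER_BOUNDS = [0, 1, 2, 4, 8, 16, 32]


def range_index(degree, lower_bounds=POWER_OF_TWO_LOWER_BOUNDS):
    """Index of the range containing `degree`, given sorted lower bounds starting at 0.

    Degrees above the last lower bound fall into the last range.
    """
    if degree < 0:
        raise ValueError("degree must be non-negative")
    return bisect_right(lower_bounds, degree) - 1


def power_of_two_range(degree):
    """Shortcut valid for the power-of-two ranges: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ..."""
    return degree.bit_length()
```

With these ranges, `range_index(23)` returns 5, and the largest index, 6, is the value of K to use in table 1. A slower-growing set of ranges only requires a different `lower_bounds` list.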

One may overcome this limitation by coarse-graining degrees into ranges: the range containing the degree of a node is specified instead of the degree itself. For example, given the ranges

[0, 0], [1, 1], [2, 3], [4, 7], [8, 15], [16, 31] and [32, 63]
(range 0, range 1, range 2, range 3, range 4, range 5 and range 6, respectively),

we would say of a degree 23 node that its degree lies within range 5. Hence, for the purpose of evaluating the number of motifs in table 1, one should here use K = 6 (one less than the total number of ranges) instead of K = 63 (highest representable degree).

While the previous example used powers of 2 for simplicity, a slower increase is probably desirable in most applications. However, since the number of neighbourhood and degreed motifs strongly depends on K, even the slightest reduction in this number may be significant. Note that this coarse-graining method is of particular interest when the real-world data used to calibrate the model are already coarse-grained, which is commonly the case for census data.

3 Deterministic solution of on-the-fly SIR model

This section provides the deterministic solution of (MT10) from the main text [1]. Our result coincides with that of the SIR pair model presented in [7, 8].
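We first rewrite (MT3) for the specific case of (MT10)

$$\frac{d\boldsymbol{\mu}(t)}{dt} = \sum_{\nu}\sum_{\kappa}\sum_{\kappa'}\mathbf{r}^{\mathcal{I}_{\nu\kappa\kappa'}}\,\mathcal{W}_{\mathcal{I}_{\nu\kappa\kappa'}}\big(\boldsymbol{\mu}(t),\mathcal{Y}\big) + \sum_{\kappa}\mathbf{r}^{\mathcal{R}_{\kappa}}\,\mathcal{W}_{\mathcal{R}_{\kappa}}\big(\boldsymbol{\mu}(t),\mathcal{Y}\big) \qquad (1)$$

then collect the contributions to $\mu_{S\Lambda_1(\kappa)}$, $\mu_{I\Lambda_1(\kappa)}$ and $\mu_{R\Lambda_1(\kappa)}$ (dropping the functional dependencies for brevity)

$$\frac{d\mu_{S\Lambda_1(\kappa)}}{dt} = -\sum_{\kappa'}\mathcal{W}_{\mathcal{I}_{S\kappa\kappa'}} \qquad (2a)$$

$$\frac{d\mu_{I\Lambda_1(\kappa)}}{dt} = \sum_{\kappa'}\Big(\mathcal{W}_{\mathcal{I}_{S(\kappa+1)\kappa'}} + \mathcal{W}_{\mathcal{I}_{I(\kappa+1)\kappa'}} - \mathcal{W}_{\mathcal{I}_{I\kappa\kappa'}}\Big) + \sum_{\nu}\sum_{\kappa'}\Big(\mathcal{W}_{\mathcal{I}_{\nu\kappa'(\kappa+1)}} - \mathcal{W}_{\mathcal{I}_{\nu\kappa'\kappa}}\Big) - \mathcal{W}_{\mathcal{R}_{\kappa}} \qquad (2b)$$

$$\frac{d\mu_{R\Lambda_1(\kappa)}}{dt} = \sum_{\kappa'}\Big(\mathcal{W}_{\mathcal{I}_{R(\kappa+1)\kappa'}} - \mathcal{W}_{\mathcal{I}_{R\kappa\kappa'}}\Big) + \mathcal{W}_{\mathcal{R}_{\kappa}} \qquad (2c)$$

Using the definitions

$$\lambda = \sum_{\kappa}\kappa\,\mu_{I\Lambda_1(\kappa)} \quad\text{and}\quad \omega = \sum_{\nu}\sum_{\kappa}\kappa\,\mu_{\nu\Lambda_1(\kappa)} \qquad (3)$$

where λ is the total number of stubs belonging to infectious nodes and ω is the total number of stubs in the system, (2) becomes

$$\frac{d\mu_{S\Lambda_1(\kappa)}}{dt} = -\frac{\beta\lambda}{\omega}\,\kappa\,\mu_{S\Lambda_1(\kappa)} \qquad (4a)$$

$$\frac{d\mu_{I\Lambda_1(\kappa)}}{dt} = \frac{\beta\lambda}{\omega}(\kappa+1)\,\mu_{S\Lambda_1(\kappa+1)} - \alpha\,\mu_{I\Lambda_1(\kappa)} + \frac{\beta(\lambda+\omega)}{\omega}\Big((\kappa+1)\,\mu_{I\Lambda_1(\kappa+1)} - \kappa\,\mu_{I\Lambda_1(\kappa)}\Big) \qquad (4b)$$

$$\frac{d\mu_{R\Lambda_1(\kappa)}}{dt} = \frac{\beta\lambda}{\omega}\Big((\kappa+1)\,\mu_{R\Lambda_1(\kappa+1)} - \kappa\,\mu_{R\Lambda_1(\kappa)}\Big) + \alpha\,\mu_{I\Lambda_1(\kappa)} \qquad (4c)$$

Note that (MT10f) has been approximated by dropping the Kronecker delta in the numerator and the −1 in the denominator.

We now consider the evolution of the total number of stubs ω by summing the contributions from (4)

$$\frac{d\omega}{dt} = \sum_{\nu}\sum_{\kappa}\kappa\,\frac{d\mu_{\nu\Lambda_1(\kappa)}}{dt} = -2\beta\lambda \qquad (5)$$

One may understand (5) as "during the time interval [t, t + dt), each one of the λ stubs belonging to infectious nodes has probability β dt to be paired to another stub, thus decreasing ω by 2". Denoting by $\omega_0 = \omega(0)$ the total number of stubs in the initial condition, we introduce the change of variable

$$\theta = \sqrt{\frac{\omega}{\omega_0}} \quad\text{such that}\quad \frac{d\theta}{dt} = -\frac{\beta\lambda}{\theta\,\omega_0} \qquad (6)$$

Notice that t = 0 corresponds to θ = 1 and that θ decreases with time. Using this change of variable in (4a) gives

$$\frac{d\mu_{S\Lambda_1(\kappa)}}{d\theta} = \frac{\kappa\,\mu_{S\Lambda_1(\kappa)}}{\theta} \qquad (7)$$

which has the solution

$$\mu_{S\Lambda_1(\kappa)}(t) = x_{S\Lambda_1(\kappa)}(0)\,\big(\theta(t)\big)^{\kappa} \qquad (8)$$

using the initial condition $\mu_{S\Lambda_1(\kappa)}(0) = x_{S\Lambda_1(\kappa)}(0)$. For convenience, we define

$$f(\theta) = \sum_{\kappa} x_{S\Lambda_1(\kappa)}(0)\,\theta^{\kappa} \qquad (9)$$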
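Noticing that

$$\sum_{\kappa}\kappa(\kappa+1)\,\mu_{S\Lambda_1(\kappa+1)} = \theta^2\sum_{\kappa}\kappa(\kappa-1)\,x_{S\Lambda_1(\kappa)}(0)\,\theta^{\kappa-2} = \theta^2 f''(\theta) \qquad (10)$$

we obtain the evolution of the total number λ of stubs belonging to infectious nodes by summing the contributions (4b)

$$\frac{d\lambda}{dt} = \sum_{\kappa}\kappa\,\frac{d\mu_{I\Lambda_1(\kappa)}}{dt} = \frac{\beta\lambda}{\omega}\,\theta^2 f''(\theta) - \alpha\lambda - \frac{\beta\lambda(\lambda+\omega)}{\omega} \qquad (11)$$

Again using the change of variable (6), we get

$$\frac{d\lambda}{d\theta} = \frac{\lambda}{\theta} + \theta\,\omega_0\left(1 + \frac{\alpha}{\beta}\right) - \theta f''(\theta) \qquad (12)$$

which has the solution

$$\lambda = \omega_0\,\theta^2\left(1 + \frac{\alpha}{\beta}\right) - \frac{\alpha}{\beta}\,\omega_0\,\theta - \theta f'(\theta) \qquad (13)$$

for an initial condition without removed nodes [i.e., $\lambda(0) = \omega_0 - f'(1)$]. Using this solution in (6) gives

$$\frac{d\theta}{dt} = -\beta\theta + \alpha(1-\theta) + \frac{\beta f'(\theta)}{\omega_0} \qquad (14)$$

whose solution provides θ(t). Using

$$S(t) = f\big(\theta(t)\big) \qquad (15a)$$

$$I(t) = N - S(t) - R(t) \qquad (15b)$$

$$\frac{dR}{dt} = \alpha\,I(t) \qquad (15c)$$

we finally obtain the total number $S = \sum_{\kappa}\mu_{S\Lambda_1(\kappa)}$ of susceptible, $I = \sum_{\kappa}\mu_{I\Lambda_1(\kappa)}$ of infectious and $R = \sum_{\kappa}\mu_{R\Lambda_1(\kappa)}$ of removed nodes at any given time t. A direct application of (8)-(9) provides (15a), conservation of the nodes provides (15b) and using the definitions of I(t) and R(t) in (4c) provides (15c). Although obtained differently, this solution corresponds to that of the pair-based SIR model presented in [7, 8].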
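Equations (14)-(15) reduce the mean behaviour to two ordinary differential equations, for θ and R, which are easy to integrate numerically. The sketch below uses a classical fourth-order Runge-Kutta scheme; it is our illustration, not the authors' code. The function name, the step size and the example inputs are hypothetical: we take degree counts {0, 50, 200, 450, 300} for degrees 0 to 4 with 2% of each degree class initially infectious, β = α = 1, and, as required by (13), no removed node at t = 0.

```python
def on_the_fly_sir_mean(n_susceptible, n_total, beta, alpha, t_max, dt=0.01):
    """Integrate (14)-(15) with a classical fourth-order Runge-Kutta scheme.

    n_susceptible[k] plays the role of x_{S Lambda_1(k)}(0) and n_total[k] is
    the total number of degree-k nodes. Assumes no removed node at t = 0.
    Returns a list of (t, S, I, R).
    """
    N = sum(n_total)
    omega0 = sum(k * n for k, n in enumerate(n_total))  # initial number of stubs

    def f(theta):  # generating function (9)
        return sum(x * theta**k for k, x in enumerate(n_susceptible))

    def f_prime(theta):
        return sum(k * x * theta ** (k - 1) for k, x in enumerate(n_susceptible) if k > 0)

    def rhs(theta, R):
        dtheta = -beta * theta + alpha * (1.0 - theta) + beta * f_prime(theta) / omega0  # (14)
        I = N - f(theta) - R  # (15a) and (15b)
        return dtheta, alpha * I  # (15c)

    theta, R, t = 1.0, 0.0, 0.0
    out = [(t, f(theta), N - f(theta) - R, R)]
    for _ in range(int(round(t_max / dt))):
        k1 = rhs(theta, R)
        k2 = rhs(theta + 0.5 * dt * k1[0], R + 0.5 * dt * k1[1])
        k3 = rhs(theta + 0.5 * dt * k2[0], R + 0.5 * dt * k2[1])
        k4 = rhs(theta + dt * k3[0], R + dt * k3[1])
        theta += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        R += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        t += dt
        S = f(theta)
        out.append((t, S, N - S - R, R))
    return out


trajectory = on_the_fly_sir_mean([0, 49, 196, 441, 294], [0, 50, 200, 450, 300],
                                  beta=1.0, alpha=1.0, t_max=20.0)
```

Only θ and R are integrated; S and I then follow from (15a) and (15b), so the size of the problem no longer depends on the maximal degree once f and f' are available.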



4 Alternative form of on-the-fly SIR model 

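The $R\Lambda_1(\kappa)$ motifs were included in (MT10a) for the sake of clarity alone: denoting by s an unmatched stub motif, we could rewrite (MT10) with a state vector two thirds the size of (MT10a)

$$\mathbf{x} = \big(\ldots, x_{S\Lambda_1(\kappa)}, \ldots, x_{I\Lambda_1(\kappa)}, \ldots, x_s\big)$$

$$\mathbf{r}^{\mathcal{I}_{S\kappa\kappa'}} = \mathbf{e}_{I\Lambda_1(\kappa-1)} - \mathbf{e}_{S\Lambda_1(\kappa)} + \mathbf{e}_{I\Lambda_1(\kappa'-1)} - \mathbf{e}_{I\Lambda_1(\kappa')} - 2\,\mathbf{e}_s$$

$$\mathbf{r}^{\mathcal{I}_{I\kappa\kappa'}} = \mathbf{e}_{I\Lambda_1(\kappa-1)} - \mathbf{e}_{I\Lambda_1(\kappa)} + \mathbf{e}_{I\Lambda_1(\kappa'-1)} - \mathbf{e}_{I\Lambda_1(\kappa')} - 2\,\mathbf{e}_s$$

$$\mathbf{r}^{\mathcal{I}_{R\kappa'}} = \mathbf{e}_{I\Lambda_1(\kappa'-1)} - \mathbf{e}_{I\Lambda_1(\kappa')} - 2\,\mathbf{e}_s$$

$$\mathbf{r}^{\mathcal{R}_{\kappa}} = -\mathbf{e}_{I\Lambda_1(\kappa)}$$

$$\mathcal{W}_{\mathcal{I}_{S\kappa\kappa'}}(\mathbf{x},\mathcal{Y}) = \beta\kappa'\,x_{I\Lambda_1(\kappa')}\,\frac{\kappa\,x_{S\Lambda_1(\kappa)}}{x_s - 1}$$

$$\mathcal{W}_{\mathcal{I}_{I\kappa\kappa'}}(\mathbf{x},\mathcal{Y}) = \beta\kappa'\,x_{I\Lambda_1(\kappa')}\,\frac{\kappa\,x_{I\Lambda_1(\kappa)} - \delta_{\kappa\kappa'}}{x_s - 1}$$

$$\mathcal{W}_{\mathcal{I}_{R\kappa'}}(\mathbf{x},\mathcal{Y}) = \beta\kappa'\,x_{I\Lambda_1(\kappa')}\,\frac{x_s - \sum_{\kappa''}\kappa''\big(x_{S\Lambda_1(\kappa'')} + x_{I\Lambda_1(\kappa'')}\big)}{x_s - 1}$$

$$\mathcal{W}_{\mathcal{R}_{\kappa}}(\mathbf{x},\mathcal{Y}) = \alpha\,x_{I\Lambda_1(\kappa)}$$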
w(x,n = ^ K (x,y) = o . 

Note that $x_s$ plays the same role as ω in Sec. 3. Proceeding similarly, the state vector $\mathbf{x} = (\ldots, x_{S\Lambda_1(\kappa)}, \ldots, x_s)$ would suffice for an SI model (i.e., α = 0) [9].






References 

[1] Noel PA, Allard A, Hebert-Dufresne L, Vincent M, Dube LJ. Markov processes on complex networks with applications to spreading dynamics. Exact reference to be inserted by editors. 2011.

[2] Marceau V, Noel PA, Hebert-Dufresne L, Allard A, Dube LJ. Modeling the dynamical interaction between epidemics on overlay networks. Phys Rev E. 2011;84(2):026105.

[3] House T, Davies G, Danon L, Keeling MJ. A Motif-Based Approach to Network Epidemics. Bull Math Biol. 2009;71:1693-1706.

[4] Keeling MJ, Rand DA, Morris AJ. Correlation models for childhood epidemics. Proc R Soc B. 1997;264(1385):1149-1156.

[5] Hebert-Dufresne L, Noel PA, Marceau V, Allard A, Dube LJ. Propagation dynamics on networks featuring complex topologies. Phys Rev E. 2010;82(3):036115.

[6] Eames KTD, Keeling MJ. Modeling dynamic and network heterogeneities in the spread of sexually transmitted diseases. PNAS. 2002;99:13330-13335.

[7] Volz E. SIR dynamics in random networks with heterogeneous connectivity. J Math Biol. 2008;56(3):293-310.

[8] Miller JC. A note on a paper by Erik Volz: SIR dynamics in random networks. J Math Biol. 2010;62(3):349-358.

[9] Noel PA, Allard A, Hebert-Dufresne L, Vincent M, Dube LJ. Propagation on networks: an exact alternative perspective. e-print arXiv:1102.0987. 2011; submitted for publication.






